A CCG-based Quality Estimation Metric for Statistical Machine Translation
Abstract
We describe a metric for estimating the quality of Statistical Machine Translation (SMT) output based on syntactic features extracted using Combinatory Categorial Grammar (CCG). CCG has been shown to be better suited to handling SMT output than context-free phrase-structure grammar formalisms. We use CCG features to estimate the grammaticality of translations by dividing each translation into maximal grammatical chunks extracted from its CCG parse chart. We compare the performance of our CCG features with strong baseline and linguistic feature sets on French–English and Arabic–English data sets annotated with various quality scores. The results show that our CCG features outperform the baseline and linguistic features in most of the experiments. Furthermore, we demonstrate that our CCG features complement other types of features: combining CCG features with the baseline and other linguistic features further improves performance.
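The chunking idea can be illustrated with a small sketch. It assumes the CCG parse chart has already been reduced to a set of half-open token spans `(start, end)` that received at least one category, and it interprets "maximal grammatical chunks" as the segmentation of the sentence into the fewest such spans (single tokens are always allowed as a fallback). This interpretation, the function names `maximal_chunks` and `chunk_features`, and the feature names are hypothetical illustrations, not the authors' actual procedure or feature set.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # half-open token span [start, end)


def maximal_chunks(n: int, parsable: Set[Span]) -> List[Span]:
    """Hypothetical: split tokens 0..n-1 into the fewest chunks, where each
    chunk is a chart span with a CCG category or a single token (fallback)."""
    INF = float("inf")
    best = [INF] * (n + 1)  # best[j]: fewest chunks covering tokens [0, j)
    back = [0] * (n + 1)    # back[j]: start index of the last chunk ending at j
    best[0] = 0
    for j in range(1, n + 1):
        for i in range(j):
            if (i, j) in parsable or j - i == 1:
                if best[i] + 1 < best[j]:
                    best[j] = best[i] + 1
                    back[j] = i
    # Walk the back-pointers from the end of the sentence to recover chunks.
    chunks: List[Span] = []
    j = n
    while j > 0:
        i = back[j]
        chunks.append((i, j))
        j = i
    chunks.reverse()
    return chunks


def chunk_features(n: int, parsable: Set[Span]) -> Dict[str, float]:
    """Hypothetical grammaticality features derived from the chunking."""
    chunks = maximal_chunks(n, parsable)
    lengths = [end - start for start, end in chunks]
    return {
        "num_chunks": float(len(chunks)),
        "max_chunk_len": float(max(lengths, default=0)),
        "mean_chunk_len": sum(lengths) / len(lengths) if lengths else 0.0,
        # 1.0 if the chart contains a span covering the whole sentence.
        "full_parse": float((0, n) in parsable),
    }
```

For a five-token translation whose chart contains the spans `(0, 2)`, `(0, 3)`, `(1, 3)` and `(2, 5)`, `maximal_chunks(5, {(0, 2), (0, 3), (1, 3), (2, 5)})` returns `[(0, 2), (2, 5)]`: two chunks, so the sentence is not fully parsable but splits into two grammatical pieces. Intuitively, fewer and longer chunks suggest a more grammatical translation.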
Similar resources
CCG Supertags in Factored Statistical Machine Translation
Combinatorial Categorial Grammar (CCG) supertags present phrase-based machine translation with an opportunity to access rich syntactic information at a word level. The challenge is incorporating this information into the translation process. Factored translation models allow the inclusion of supertags as a factor in the source or target language. We show that this results in an improvement in t...
Supertags as Source Language Context in Hierarchical Phrase-Based SMT
Statistical machine translation (SMT) models have recently begun to include source context modeling, under the assumption that the proper lexical choice of the translation for an ambiguous word can be determined from the context in which it appears. Various types of lexical and syntactic features have been explored as effective source context to improve phrase selection in SMT. In the present w...
Extending CCG-based Syntactic Constraints in Hierarchical Phrase-Based SMT
In this paper, we describe two approaches to extending syntactic constraints in the Hierarchical Phrase-Based (HPB) Statistical Machine Translation (SMT) model using Combinatory Categorial Grammar (CCG). These extensions target the limitations of previous syntax-augmented HPB SMT systems which limit the coverage of the syntactic constraints applied. We present experiments on Arabic–English and ...
CCG-based Models for Statistical Machine Translation
The arguably best performing statistical machine translation systems are based on context-free formalisms or weakly equivalent ones. These models usually use a synchronous version of a context-free grammar (SCFG) which we argue is too rigid for the highly ambiguous task of human language translation. This is exacerbated by the fact that the imperfect methods available for aligning parallel text...
CCG-Augmented Hierarchical Phrase-Based Statistical Machine Translation
Publication date: 2013